feat(middleware): Model routing, PII filtering, Cloud model proxies by richiejp · Pull Request #9802 · mudler/LocalAI

richiejp · 2026-05-13T11:59:36Z

Adds middleware that analyzes requests, then routes, filters, and transforms them.

Chat requests can be classified and labelled as requiring particular capabilities,
then routed to a model which satisfies all of those capabilities. Requests that require fewer capabilities can naturally be handled by smaller, specialized models. In addition, the more uncertain the classifier is, the more capabilities it selects, so difficult requests are routed to larger general-purpose models.
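As a rough illustration of the routing rule described above (not the code in this PR), the sketch below picks the cheapest model whose advertised capabilities cover everything a request needs. The `Model` type, the `Cost` field, and the capability names are all hypothetical.

```go
package main

import "fmt"

// Model is a hypothetical routing candidate: a name, a relative cost,
// and the set of capabilities it advertises.
type Model struct {
	Name         string
	Cost         int
	Capabilities map[string]bool
}

// satisfies reports whether m advertises every required capability.
func satisfies(m Model, required []string) bool {
	for _, c := range required {
		if !m.Capabilities[c] {
			return false
		}
	}
	return true
}

// pickModel returns the cheapest model that covers all required
// capabilities. The second result is false when no model qualifies.
func pickModel(models []Model, required []string) (Model, bool) {
	var best Model
	found := false
	for _, m := range models {
		if !satisfies(m, required) {
			continue
		}
		if !found || m.Cost < best.Cost {
			best, found = m, true
		}
	}
	return best, found
}

func main() {
	models := []Model{
		{Name: "small-code", Cost: 1, Capabilities: map[string]bool{"code": true}},
		{Name: "large-general", Cost: 10, Capabilities: map[string]bool{"code": true, "math": true, "reasoning": true}},
	}
	m, _ := pickModel(models, []string{"code"})
	fmt.Println(m.Name) // small-code
	m, _ = pickModel(models, []string{"code", "math"})
	fmt.Println(m.Name) // large-general
}
```

A request that needs only one capability lands on the small model, while a request labelled with several falls through to the general-purpose one.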

Classification is very fast, but once requests have been classified, their embeddings can be used to avoid classifying similar requests again. This works by labelling the embeddings of past requests and then running a cosine similarity search against the embedding of each new request.
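A minimal sketch of that lookup, assuming an in-memory slice of labelled embeddings and a fixed similarity threshold. The PR itself uses a vector store backend; the `labelled` type, `lookup` function, and threshold value here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// labelled pairs the embedding of a past request with the label it was given.
type labelled struct {
	Embedding []float64
	Label     string
}

// cosine returns the cosine similarity of a and b. It returns 0 when the
// dimensions differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// lookup returns the label of the most similar past request whose
// similarity is at least threshold. ok is false on a miss, in which case
// the caller would fall back to running the classifier.
func lookup(past []labelled, query []float64, threshold float64) (label string, ok bool) {
	best := threshold
	for _, p := range past {
		if s := cosine(p.Embedding, query); s >= best {
			best, label, ok = s, p.Label, true
		}
	}
	return label, ok
}

func main() {
	past := []labelled{
		{Embedding: []float64{1, 0}, Label: "code"},
		{Embedding: []float64{0, 1}, Label: "chat"},
	}
	fmt.Println(lookup(past, []float64{0.9, 0.1}, 0.9)) // code true
	fmt.Println(lookup(past, []float64{1, 1}, 0.9))     //  false
}
```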

Private information can be detected. When it is found in a request, the request can be modified to redact it,
routed differently, or blocked.
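The three outcomes can be sketched with a single regex pattern. This is an illustration only: the email pattern, the `Action` names, the `[EMAIL]` placeholder, and `applyPII` are assumptions, not the patterns or API shipped in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Action is what to do when PII is found (hypothetical names).
type Action int

const (
	Redact  Action = iota // replace matches with a placeholder
	Reroute               // leave the text alone; the caller picks another model
	Block                 // reject the request
)

// ErrBlocked is returned when a request is rejected because it contains PII.
var ErrBlocked = errors.New("request blocked: PII detected")

// emailRe is a deliberately simple email pattern used for illustration.
var emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// applyPII checks text for email addresses and applies the given action.
// found reports whether any PII matched.
func applyPII(text string, action Action) (out string, found bool, err error) {
	if !emailRe.MatchString(text) {
		return text, false, nil
	}
	switch action {
	case Redact:
		return emailRe.ReplaceAllString(text, "[EMAIL]"), true, nil
	case Block:
		return "", true, ErrBlocked
	default: // Reroute
		return text, true, nil
	}
}

func main() {
	out, found, err := applyPII("mail alice@example.com now", Redact)
	fmt.Println(out, found, err) // mail [EMAIL] now true <nil>

	_, _, err = applyPII("mail alice@example.com now", Block)
	fmt.Println(err) // request blocked: PII detected
}
```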

Cloud models and a MITM proxy can be configured and take part in filtering and routing.
This allows sending easy requests to smaller local models and hard ones to cloud models.
The MITM proxy allows you to use Claude Code or Codex subscriptions (OAuth) with the PII
filter and potentially even with routing (although this is limited by the cloud providers' ToS).

Routing classifies requests using a model such as ArchRouter, which labels each request.
We score each request on the possible capabilities it may require and pick a model which
has all of the capabilities with scores towards the top of the distribution.
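One way to read "scores towards the top of the distribution" is to keep every capability whose score lies within a margin of the best score. The sketch below does that. The margin rule, the `selectCapabilities` name, and the example scores are assumptions made for illustration; the PR's actual selection rule may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// selectCapabilities keeps every capability whose score is within margin
// of the highest score. A flat (uncertain) distribution therefore selects
// more capabilities than a peaked (confident) one.
func selectCapabilities(scores map[string]float64, margin float64) []string {
	if len(scores) == 0 {
		return nil
	}
	top := math.Inf(-1)
	for _, s := range scores {
		if s > top {
			top = s
		}
	}
	var out []string
	for name, s := range scores {
		if s >= top-margin {
			out = append(out, name)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	confident := map[string]float64{"code": 0.9, "math": 0.05, "chat": 0.05}
	uncertain := map[string]float64{"code": 0.4, "math": 0.35, "chat": 0.25}

	fmt.Println(selectCapabilities(confident, 0.1)) // [code]
	fmt.Println(selectCapabilities(uncertain, 0.1)) // [code math]
}
```

Under this rule an uncertain classification asks for more capabilities, which is what pushes hard requests towards larger models.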

The ability to score multiple choices is an interesting feature in its own right.
It allows you to very quickly check with what probability an LLM would produce a particular
answer.
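To make the idea concrete: if a backend returns a total log-likelihood for each candidate answer, a softmax over those values gives the relative probability of each choice. The sketch below assumes that input shape and normalizes over a fixed candidate set. `choiceProbabilities` is a hypothetical helper, not the Score RPC added in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// choiceProbabilities converts per-candidate log-likelihoods into
// probabilities that sum to 1 over the candidate set (a softmax).
// Subtracting the maximum first keeps math.Exp from underflowing
// when the log-likelihoods are large and negative.
func choiceProbabilities(logLik []float64) []float64 {
	if len(logLik) == 0 {
		return nil
	}
	top := logLik[0]
	for _, l := range logLik[1:] {
		if l > top {
			top = l
		}
	}
	probs := make([]float64, len(logLik))
	var sum float64
	for i, l := range logLik {
		probs[i] = math.Exp(l - top)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	// Hypothetical log-likelihoods for three candidate answers.
	for _, p := range choiceProbabilities([]float64{-1, -2, -3}) {
		fmt.Printf("%.3f\n", p)
	}
	// Prints:
	// 0.665
	// 0.245
	// 0.090
}
```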

feat(routing): add billing recorder and stats backend foundation
feat(routing): expose usage stats in REST, UI, and MCP
feat(routing): add regex PII filter with REST and MCP surfaces
feat(routing): record usage end-to-end in no-auth mode
feat(routing): per-model PII gating + middleware admin page
feat(routing): rule-based intelligent router (subsystem 2 MVP)
feat(routing): streaming PII filter with buffered-emit invariant
feat(routing): PII pattern editor in model config UI
feat(routing): streaming PII filter on Anthropic /v1/messages and /v1/completions
feat(routing): cloud passthrough proxy (subsystem 4 MVP)
docs(routing): cloud passthrough proxy feature page
feat(routing): MITM proxy for subscription-auth Claude Code / Codex
feat(mitm): negotiate HTTP/2 with h1.1 fallback
refactor(cloudproxy): extract shared SSE wire helpers, trim dead state and comments
feat(import-model): add cloud-proxy templates to YAML editor
Revert "feat(import-model): add cloud-proxy templates to YAML editor"
feat(model-editor): add cloud-proxy templates to Add Model picker
feat(mitm): runtime control of listener and intercept allowlist
feat(middleware-ui): MITM proxy admin tab
refactor(mitm): simplify-pass cleanup
feat(mitm): emit proxy_connect + proxy_traffic audit events
test(mitm): cover tunneled-host event + Events tab kind filter
fix(mitm): restore listener from runtime_settings.json on restart
fix(routing): address code-review findings across pii/mitm/router
feat(middleware): per-pattern PII toggle, model-config-owned MITM hosts
refactor(store/local): extract in-process vector store library
feat(routing): KNN + LLM classifiers and per-model admission control
refactor(store): keep the vector store out of the main process
feat(backend): TokenClassify RPC + transformers NER pipeline
fix(openai): add missing auth import to chat.go
feat(pii): NER tier in the redactor
feat(middleware-ui): router template + Create routing model link
fix(model-editor): code-editor crash on structured template values
feat(model-editor): structured router-candidates editor + proxy chat usecase
fix(router-candidates): one textarea per exemplar, multi-line-safe
feat(router): KNN consumes a benchmarker-produced routing dataset
docs(router): recommend nomic-embed-text-v1.5 over Longformer
feat(routing): Score gRPC primitive, score classifier, L2 embedding cache

Big-bang squash-friendly commit covering the work since master: phases 1-7 of the cloud-proxy migration, tool-call support, plus the surrounding routing / middleware / PII / billing scaffolding this branch had been carrying.

Cloud-proxy backend (backend/go/cloud-proxy/):
* New gRPC backend with two modes.
* Passthrough: Forward RPC shovels raw HTTP between client and upstream so the wire format is preserved byte-for-byte.
* Translate: PredictRich / PredictStreamRich convert internal proto to OpenAI Chat Completions or Anthropic Messages, preserving tool calls + usage tokens through pb.Reply.
* API keys resolved from api_key_env or api_key_file (mutually exclusive), never stored in YAML.

gRPC interface (pkg/grpc/):
* Forward bidi RPC added to Backend proto.
* AIModelRich optional extension interface returning *pb.Reply so backends can surface tool_calls and usage tokens.
* Fixed forwardClient.CloseSend prematurely closing the gRPC connection (caught by e2e tests). Cleanup now fires on stream end (Recv error/EOF) instead.

Core integration:
* IsCloudProxyBackendPassthrough hook in chat + Anthropic endpoints; legacy "proxy-*" backend prefix removed (hard cutover, nothing released).
* cloudproxy.ForwardViaBackend + cloudproxy.BuildStreamFilter shared by both endpoint families.
* PII filter applies to translate mode via the standard streaming pipeline; verified by e2e.

Routing + middleware (carried from earlier on the branch):
* Score / Rerank / Embedder / VectorStore interfaces in core/backend with Application factory methods.
* Router with score classifier, depth-1 invariant, embedding cache, PII config, billing recorder.
* Admission middleware, route-model dispatch, usage stamping.
* MITM proxy + CA management for intercepting cloud traffic.
* Middleware admin page in the React UI.

Local-store backend rewrite + tests covering Set / Get / Delete / Find invariants.

Llama-cpp Score concurrency guard: conflict_guard tripwire plus FLAG_SCORE/{CHAT,COMPLETION,EMBEDDINGS} validation rule in core/config.

Tests: 60+ new unit tests across cloud-proxy backend, cloudproxy core glue, gRPC server + AIModelRich dispatch, config validation, and 6 e2e specs that stand up a real two-process gRPC link with fake upstreams (gaps mudler#1/mudler#2/mudler#3 from review).

Docs: cloud-proxy.md, middleware.md, mitm-proxy.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>

`go build ./...` (and other multi-package builds that include backend/go/cloud-proxy or backend/go/local-store) writes a binary named after the package directory into the working directory. Add both names to the existing root-binary ignore block so the working tree stays clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

richiejp force-pushed the feat/routing-stats-backend branch 4 times, most recently from aff5af4 to 8389d96 Compare May 13, 2026 14:54

mudler added needs-review labels May 13, 2026

richiejp force-pushed the feat/routing-stats-backend branch 2 times, most recently from 99f79f4 to d8b32b7 Compare May 19, 2026 09:33

richiejp and others added 2 commits May 19, 2026 10:49

richiejp force-pushed the feat/routing-stats-backend branch from d8b32b7 to d82ad5c Compare May 19, 2026 09:49

richiejp wants to merge 2 commits into mudler:master from richiejp:feat/routing-stats-backend
